Development of a Cantonese-English code-mixing speech corpus
نویسندگان
چکیده
This paper describes the design and compilation of the CUMIX Cantonese-English code-mixing speech corpus. Code-mixing is a common phenomenon in many bilingual societies and it usually involves at least two different languages within one utterance. In Hong Kong, people usually mix English words and phrases with Cantonese in their daily conversation. Although there are many monolingual corpora of Cantonese and English, code-mixing speech database of these two languages is not available. The aim of developing this corpus is to study of the effect of Cantonese accents in English, the design of effective language boundary detection algorithm in code-mixing utterances [1], and evaluation of the performance of code-mixing speech recognizers.
منابع مشابه
Automatic Recognition of Cantonese-English Code-Mixing Speech
Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching of two different languages in a spoken utterance. This paper presents the first study on automatic recognition of Cantonese-English code-mixing speech, which is common in Hong Kong. This study starts with the design and compilation of code-mixing speech and text corpora. The problems of acoust...
متن کاملAutomatic speech recognition of Cantones
This paper describes our recent work on the development of a largevocabulary, speaker-independent, continuous speech recognition system for Cantonese-English code-mixing utterances. The details of both acoustic modeling and language modeling will be discussed. For acoustic modeling, Cantonese accents in English words are handled by applying cross-lingual acoustic units, as well as modifications...
متن کاملEffects of language mixing for automatic recognition of Cantonese-English code-mixing utterances
While automatic speech recognition of either Cantonese or English alone has achieved a great degree of success, recognition of Canton-English code-mixing speech is not as trivial. This paper attempts to analyze the effect of language mixing on recognition performance of code-mixing utterances. By examining the recognition results of Canton-English code-mixing speech, where Canton is the matrix ...
متن کاملCode-Mixing and Mixed Verbs in Cantonese-English Bilingual Children: Input and Innovation
In both child and adult Cantonese, code-mixing is used productively. We focus on the insertion of English verbs into Cantonese utterances. Data from nine simultaneous bilingual children in the Hong Kong Bilingual Child Language Corpus are analyzed. Case studies show that the children’s rates of mixing closely match the rate of mixing in the parental input, and that different input conditions in...
متن کاملMainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005